78 research outputs found
Ariadne: Analysis for Machine Learning Program
Machine learning has transformed domains like vision and translation, and is
now increasingly used in science, where the correctness of such code is vital.
Python is popular for machine learning, in part because of its wealth of
machine learning libraries, and is felt to make development faster; however,
this dynamic language has less support for error detection at code creation
time than tools like Eclipse. This is especially problematic for machine
learning: given its statistical nature, code with subtle errors may run and
produce results that look plausible but are meaningless. This can vitiate
scientific results. We report on Ariadne: applying a static framework, WALA, to
machine learning code that uses TensorFlow. We have created static analysis for
Python, a type system for tracking tensors---Tensorflow's core data
structures---and a data flow analysis to track their usage. We report on how it
was built and present some early results
Who you gonna call? Analyzing Web Requests in Android Applications
Relying on ubiquitous Internet connectivity, applications on mobile devices
frequently perform web requests during their execution. They fetch data for
users to interact with, invoke remote functionalities, or send user-generated
content or meta-data. These requests collectively reveal common practices of
mobile application development, like what external services are used and how,
and they point to possible negative effects like security and privacy
violations, or impacts on battery life. In this paper, we assess different ways
to analyze what web requests Android applications make. We start by presenting
dynamic data collected from running 20 randomly selected Android applications
and observing their network activity. Next, we present a static analysis tool,
Stringoid, that analyzes string concatenations in Android applications to
estimate constructed URL strings. Using Stringoid, we extract URLs from 30, 000
Android applications, and compare the performance with a simpler constant
extraction analysis. Finally, we present a discussion of the advantages and
limitations of dynamic and static analyses when extracting URLs, as we compare
the data extracted by Stringoid from the same 20 applications with the
dynamically collected data
Statically Checking Web API Requests in JavaScript
Many JavaScript applications perform HTTP requests to web APIs, relying on
the request URL, HTTP method, and request data to be constructed correctly by
string operations. Traditional compile-time error checking, such as calling a
non-existent method in Java, are not available for checking whether such
requests comply with the requirements of a web API. In this paper, we propose
an approach to statically check web API requests in JavaScript. Our approach
first extracts a request's URL string, HTTP method, and the corresponding
request data using an inter-procedural string analysis, and then checks whether
the request conforms to given web API specifications. We evaluated our approach
by checking whether web API requests in JavaScript files mined from GitHub are
consistent or inconsistent with publicly available API specifications. From the
6575 requests in scope, our approach determined whether the request's URL and
HTTP method was consistent or inconsistent with web API specifications with a
precision of 96.0%. Our approach also correctly determined whether extracted
request data was consistent or inconsistent with the data requirements with a
precision of 87.9% for payload data and 99.9% for query data. In a systematic
analysis of the inconsistent cases, we found that many of them were due to
errors in the client code. The here proposed checker can be integrated with
code editors or with continuous integration tools to warn programmers about
code containing potentially erroneous requests.Comment: International Conference on Software Engineering, 201
Opportunities in Software Engineering Research for Web API Consumption
Nowadays, invoking third party code increasingly involves calling web
services via their web APIs, as opposed to the more traditional scenario of
downloading a library and invoking the library's API. However, there are also
new challenges for developers calling these web APIs. In this paper, we
highlight a broad set of these challenges and argue for resulting opportunities
for software engineering research to support developers in consuming web APIs.
We outline two specific research threads in this context: (1) web API
specification curation, which enables us to know the signatures of web APIs,
and (2) static analysis that is capable of extracting URLs, HTTP methods etc.
of web API calls. Furthermore, we present new work on how we combine (1) and
(2) to provide IDE support for application developers consuming web APIs. As
web APIs are used broadly, research in supporting the consumption of web APIs
offers exciting opportunities.Comment: Erik Wittern and Annie Ying are both first author
Static Analysis of Shape in TensorFlow Programs
Machine learning has been widely adopted in diverse science and engineering domains, aided by reusable libraries and quick development patterns. The TensorFlow library is probably the best-known representative of this trend and most users employ the Python API to its powerful back-end. TensorFlow programs are susceptible to several systematic errors, especially in the dynamic typing setting of Python. We present Pythia, a static analysis that tracks the shapes of tensors across Python library calls and warns of several possible mismatches. The key technical aspects are a close modeling of library semantics with respect to tensor shape, and an identification of violations and error-prone patterns. Pythia is powerful enough to statically detect (with 84.62% precision) 11 of the 14 shape-related TensorFlow bugs in the recent Zhang et al. empirical study - an independent slice of real-world bugs
Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit State Model Checking
Web script crashes and malformed dynamically-generated web pages are common errors, and they seriously impact the usability of web applications. Current tools for web-page validation cannot handle the dynamically generated pages that are ubiquitous on today's Internet. We present a dynamic test generation technique for the domain of dynamic web applications. The technique utilizes both combined concrete and symbolic execution and explicit-state model checking. The technique generates tests automatically, runs the tests capturing logical constraints on inputs, and minimizes the conditions on the inputs to failing tests, so that the resulting bug reports are small and useful in finding and fixing the underlying faults. Our tool Apollo implements the technique for the PHP programming language. Apollo generates test inputs for a web application, monitors the application for crashes, and validates that the output conforms to the HTML specification. This paper presents Apollo's algorithms and implementation, and an experimental evaluation that revealed 302 faults in 6 PHP web applications
Finding Bugs In Dynamic Web Applications
Web script crashes and malformed dynamically-generated web pages are common errors, and they seriously impact usability of web applications. Currenttools for web-page validation cannot handle the dynamically-generatedpages that are ubiquitous on today's Internet.In this work, we apply a dynamic test generation technique, based oncombined concrete and symbolic execution, to the domain of dynamic webapplications. The technique generates tests automatically andminimizes the bug-inducing inputs to reduce duplication and to makethe bug reports small and easy to understand and fix.We implemented the technique in Apollo, an automated tool thatfound dozens of bugs in real PHP applications. Apollo generatestest inputs for the web application, monitors the application forcrashes, and validates that the output conforms to the HTMLspecification. This paper presents Apollo's algorithms andimplementation, and an experimental evaluation that revealed a totalof 214 bugs in 4 open-source PHP web applications
LakeBench: Benchmarks for Data Discovery over Data Lakes
Within enterprises, there is a growing need to intelligently navigate data
lakes, specifically focusing on data discovery. Of particular importance to
enterprises is the ability to find related tables in data repositories. These
tables can be unionable, joinable, or subsets of each other. There is a dearth
of benchmarks for these tasks in the public domain, with related work targeting
private datasets. In LakeBench, we develop multiple benchmarks for these tasks
by using the tables that are drawn from a diverse set of data sources such as
government data from CKAN, Socrata, and the European Central Bank. We compare
the performance of 4 publicly available tabular foundational models on these
tasks. None of the existing models had been trained on the data discovery tasks
that we developed for this benchmark; not surprisingly, their performance shows
significant room for improvement. The results suggest that the establishment of
such benchmarks may be useful to the community to build tabular models usable
for data discovery in data lakes
- …